Discovering patterns to extract protein-protein interactions from the literature: Part II

Authors

  • Hao Yu
  • Xiaoyan Zhu
  • Minlie Huang
  • Ming Li
Abstract

MOTIVATION: An enormous number of protein-protein interaction relationships are buried in millions of research articles published over the years, and the number is growing. Rediscovering them automatically is a challenging bioinformatics task. Solutions to this problem also reach far beyond bioinformatics.

RESULTS: We study a new approach that involves automatically discovering English expression patterns, optimizing them and using them to extract protein-protein interactions. In a sister paper, we described how to generate English expression patterns related to protein-protein interactions, and this approach alone has already achieved precision and recall rates significantly higher than those of other automatic systems. This paper continues to present our theory, focusing on how to improve the patterns. A minimum description length (MDL)-based pattern-optimization algorithm is designed to reduce and merge patterns. This has significantly increased generalization power, and hence the recall and precision rates, as confirmed by our experiments.

AVAILABILITY: http://spies.cs.tsinghua.edu.cn.
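The abstract does not spell out the MDL-based optimization, so the following is only a minimal sketch of the general two-part MDL idea (total cost = bits to write the patterns + bits to encode the sentences given the patterns), not the authors' algorithm. The pattern representation (token tuples with a `*` wildcard), the greedy merge rule, the fixed `bits_per_token` cost and all function names are assumptions made for illustration.

```python
import math
from itertools import combinations

WILDCARD = "*"  # hypothetical slot that matches any single token


def matches(pattern, sentence):
    """True if the pattern matches the sentence token by token."""
    return len(pattern) == len(sentence) and all(
        p == WILDCARD or p == t for p, t in zip(pattern, sentence)
    )


def merge(a, b):
    """Merge two equal-length patterns that differ in exactly one slot.

    Returns the merged pattern with a wildcard in that slot, or None.
    """
    if len(a) != len(b):
        return None
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) != 1:
        return None
    i = diff[0]
    return a[:i] + (WILDCARD,) + a[i + 1:]


def description_length(patterns, sentences, bits_per_token=10.0):
    """Two-part cost in bits: L(patterns) + L(sentences | patterns)."""
    model_bits = sum(len(p) for p in patterns) * bits_per_token
    index_bits = math.log2(len(patterns) + 1)
    data_bits = 0.0
    for s in sentences:
        hit = next((p for p in patterns if matches(p, s)), None)
        if hit is None:
            # Uncovered sentence: spell out every token.
            data_bits += len(s) * bits_per_token
        else:
            # Covered sentence: pattern index plus the wildcard fillers.
            data_bits += index_bits + hit.count(WILDCARD) * bits_per_token
    return model_bits + data_bits


def optimize(patterns, sentences):
    """Greedily merge pattern pairs while the total cost decreases."""
    patterns = list(dict.fromkeys(patterns))  # drop exact duplicates
    improved = True
    while improved:
        improved = False
        best = description_length(patterns, sentences)
        for a, b in combinations(patterns, 2):
            merged = merge(a, b)
            if merged is None:
                continue
            candidate = [p for p in patterns if p not in (a, b)]
            if merged not in candidate:
                candidate.append(merged)
            if description_length(candidate, sentences) < best:
                patterns = candidate
                improved = True
                break
    return patterns


# Toy data, invented for this example only.
toy_patterns = [
    ("PROTEIN", "binds", "PROTEIN"),
    ("PROTEIN", "activates", "PROTEIN"),
]
toy_sentences = toy_patterns + [("PROTEIN", "inhibits", "PROTEIN")]
print(optimize(toy_patterns, toy_sentences))
# prints [('PROTEIN', '*', 'PROTEIN')]
```

In this toy run the merged pattern is cheaper to describe and also covers the previously unmatched third sentence, which mirrors the abstract's claim that reducing and merging patterns can raise generalization power. A bare wildcard can also over-generalize, so a realistic cost function would need to penalize false matches as well.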

Similar resources

Discovering Patterns to Extract Protein-Protein Interactions from Full Biomedical Texts

Although there have been many research projects to extract protein pathways, most such information still exists only in the scientific literature, usually written in natural languages and defying data mining efforts. We present a novel and robust approach for extracting protein-protein interactions from the literature. Our method uses a dynamic programming algorithm to compute distinguishing pa...

Discovering Domains Mediating Protein Interactions

Background: Protein-protein interactions do not provide any direct information regarding the domains within the proteins that mediate the interactions. The majority of proteins are multi domain proteins and the interaction between them is often defined by the pairs of their domains. Most of the former studies focus only on interacting domain pairs. However they do not consider the in...

Discovering patterns to extract protein-protein interactions from full texts

MOTIVATION: Although there are several databases storing protein-protein interactions, most such data still exist only in the scientific literature. They are scattered in scientific literature written in natural languages, defying data mining efforts. Much time and labor have to be spent on extracting protein pathways from literature. Our aim is to develop a robust and powerful methodology to mi...

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

DNA sequence, containing all genetic traits is not a functional entity. Instead, it transfers to protein sequences by transcription and translation processes. This protein sequence takes on a 3D structure later, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can figure is undertaken by proteins with specific functio...

Evaluation of the Effect of the 47 kDa Protein Isolated from Aged Garlic Extract on Dendritic Cells

Objective(s): Garlic (Allium sativum) is known as a potent spice and a medicine with broad therapeutic properties ranging from antibacterial to anticancer, and anticoagulant. One of the major purified garlic protein components is the 47 kDa protein. In this study, the effect of 47 kDa protein extracted from aged garlic (AGE) was evaluated on mouse dendritic cell (DC) maturation in vitro. Mate...

Journal:
  • Bioinformatics

Volume 21, Issue 15

Publication date: 2005